Missing Data as a Causal and Probabilistic Problem
Authors
Abstract
Causal inference is often phrased as a missing data problem: for every unit, only the response to the observed treatment assignment is known, while the responses to other treatment assignments are not. In this paper, we extend the converse approach of [7], which casts missing data problems in causal terms, to causal models where only interventions on missingness indicators are allowed. We then use this representation to apply techniques developed for identifying causal effects, and give a general criterion for when a joint distribution containing missing variables can be recovered from the data actually observed, given assumptions on the missingness mechanisms. This criterion is significantly more general than the commonly used "missing at random" (MAR) criterion, and it generalizes past work that also exploits a graphical representation of missingness. In fact, the relationship of our criterion to MAR is not unlike the relationship between the ID algorithm for identification of causal effects [22, 18] and conditional ignorability [13].
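To make the notion of recovering a distribution from observed data concrete, the sketch below works through only the simple MAR special case mentioned in the abstract, not the paper's general criterion. It assumes a fully observed binary X, a partially observed binary Y, and a missingness indicator R that depends on X alone; all names and numbers are hypothetical and chosen purely for illustration. Under these assumptions, P(y) can be recovered as the sum over x of P(x) P(y | x, R = 1), whereas the complete-case quantity P(y | R = 1) is generally biased.

```python
from itertools import product

# Hypothetical ground truth (illustrative numbers, not taken from the paper).
p_x = {0: 0.6, 1: 0.4}              # P(X = x), X is always observed
p_y1_given_x = {0: 0.2, 1: 0.7}     # P(Y = 1 | X = x)
p_obs_given_x = {0: 0.9, 1: 0.3}    # P(R = 1 | X = x); MAR: depends on X only


def p_y_given_x(y, x):
    """P(Y = y | X = x) for binary y."""
    return p_y1_given_x[x] if y == 1 else 1.0 - p_y1_given_x[x]


# Observed-data law: (x, y) is seen when R = 1, only x is seen when R = 0.
observed = {
    (x, y): p_x[x] * p_y_given_x(y, x) * p_obs_given_x[x]
    for x, y in product((0, 1), repeat=2)
}
missing = {x: p_x[x] * (1.0 - p_obs_given_x[x]) for x in (0, 1)}


def complete_case_p_y1():
    """P(Y = 1 | R = 1): uses only rows where Y was observed."""
    total = sum(observed.values())
    return sum(observed[(x, 1)] for x in (0, 1)) / total


def mar_recovered_p_y1():
    """sum_x P(x) * P(Y = 1 | x, R = 1), computed from observed quantities only."""
    result = 0.0
    for x in (0, 1):
        p_x_and_observed = observed[(x, 0)] + observed[(x, 1)]  # P(x, R = 1)
        p_x_marginal = p_x_and_observed + missing[x]            # P(x)
        result += p_x_marginal * observed[(x, 1)] / p_x_and_observed
    return result


# The true marginal, available here only because the ground truth is simulated.
true_p_y1 = sum(p_x[x] * p_y1_given_x[x] for x in (0, 1))

if __name__ == "__main__":
    print("true P(Y=1):         ", true_p_y1)
    print("complete-case P(Y=1):", complete_case_p_y1())
    print("MAR-recovered P(Y=1):", mar_recovered_p_y1())
```

The computation is exact rather than sampled, so the comparison is not affected by simulation noise. In this toy setting the recovery formula reproduces the true marginal, while the complete-case quantity does not.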
Similar resources
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in identifying duplicates within a data set. The i...
Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data
We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of the missingness process. We extend the results of Mohan et al. [2013] by presenting more general conditions for recovering probabilistic queries of the form P(y|x) and P(y, x) as well as causal queries of the form P(y|do(x)). We show that causal queries...
Parametric and Nonparametric Regression with Missing X’s—A Review
This paper gives a detailed overview of the problem of missing data in parametric and nonparametric regression. Theoretical basics, properties, and simulation results may help the reader become familiar with the common problem of incomplete data sets. Of course, not all occurrences can be discussed, so this paper could be seen as an introduction to missing data within regression analysis an...
A Tutorial on Learning with Bayesian Networks
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to ...
A method to solve the problem of missing data, outlier data and noisy data in order to improve the performance of human and information interaction
Purpose: Errors in data collection, and failure to account for data that became noisy during collection for any reason, cause problems in data-based analysis and, as a result, lead to wrong decisions. Therefore, solving the problem of missing or noisy data before processing and analysis is of vital importance in analytical systems. The purpose of this paper is to provide a metho...